Use of Prosodic Features in Speech Recognition

Authors

  • Keikichi Hirose
  • Kouji Iwano
  • Atsuhiro Sakurai

Abstract

Two methods were proposed for using prosodic features in speech recognition. The first detects major syntactic (phrase) boundaries as the initial phase of speech recognition. The second checks the plausibility of the results of the ordinary recognition process from the viewpoint of prosodic features. In the first method, fundamental frequency contours were treated as waveforms (functions of time) and low-pass filtered to suppress their accent components. The derivative of the filtered contour was then used to detect phrase boundaries. An experiment on the ATR continuous speech database showed that the method detected about 77% of manually detectable phrase boundaries. The second method generates fundamental frequency contours for recognition candidates using a speech synthesis scheme and compares them with the observed contour. The candidate whose generated contour best matches the observed contour is taken as the final recognition result. The method was shown to be valid for detecting recognition errors that are accompanied by changes in accent type and/or in syntactic boundaries. The method was then evaluated on the detection of phrase boundaries. Allowing 1-mora discrepancies, the detection rate reached 92% for the ATR database, and a simple speaker adaptation method further improved it to 97%.
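The two methods in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several assumptions in it do not come from the paper. The log F0 contour is assumed to be sampled at a fixed frame rate, with unvoiced gaps already interpolated. A centred moving average stands in for the paper's low-pass filter. A phrase boundary is reported at the centre of each run where the smoothed contour rises by at least a hypothetical `min_rise`. For the second method, the "generated" contours of recognition candidates are supplied by the caller, so the synthesis step itself is not modelled, and they are compared with the observed contour by RMS distance. The `detection_rate` helper counts a reference boundary as found if a detection lies within `tolerance` units (for example, morae). All function names and parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def moving_average(x: Sequence[float], width: int) -> List[float]:
    """Low-pass filter (assumed stand-in): mean over a window of `width` frames.

    The window is roughly centred on each frame; near the edges it is shifted
    rather than shrunk, so every output averages exactly `width` samples.
    """
    n = len(x)
    if not 1 <= width <= n:
        raise ValueError("width must be between 1 and len(x)")
    smoothed = []
    for t in range(n):
        start = min(max(t - width // 2, 0), n - width)
        smoothed.append(sum(x[start:start + width]) / width)
    return smoothed


def detect_phrase_boundaries(log_f0: Sequence[float], width: int = 10,
                             min_rise: float = 0.1) -> List[int]:
    """Method 1 sketch: smooth the contour, differentiate, report rising runs.

    Assumption: a phrase boundary shows up as a run of frames where the
    smoothed contour rises (a phrase-level reset); the centre frame of each
    run whose total rise is at least `min_rise` is reported.
    """
    smooth = moving_average(log_f0, width)
    # First difference as a discrete derivative: deriv[t] = smooth[t] - smooth[t-1].
    deriv = [0.0] + [smooth[t] - smooth[t - 1] for t in range(1, len(smooth))]
    boundaries = []
    t = 1
    while t < len(deriv):
        if deriv[t] <= 0:
            t += 1
            continue
        start = t
        while t < len(deriv) and deriv[t] > 0:
            t += 1
        end = t - 1  # last frame of this rising run
        if smooth[end] - smooth[start - 1] >= min_rise:
            boundaries.append((start + end) // 2)
    return boundaries


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square distance between two equal-length contours."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("contours must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def select_candidate(observed: Sequence[float],
                     generated: Dict[str, Sequence[float]]) -> str:
    """Method 2 sketch: pick the candidate whose generated contour fits best."""
    return min(generated, key=lambda name: rms_distance(observed, generated[name]))


def detection_rate(detected: Sequence[int], reference: Sequence[int],
                   tolerance: int = 1) -> float:
    """Fraction of reference boundaries with a detection within `tolerance` units."""
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    hits = sum(1 for r in reference
               if any(abs(d - r) <= tolerance for d in detected))
    return hits / len(reference)


if __name__ == "__main__":
    # Synthetic log-F0: two declining phrases (reset at frame 50) plus
    # accent-like ripples with a 10-frame period.
    contour = [5.0 - 0.01 * (i % 50) + 0.05 * math.sin(2 * math.pi * i / 10)
               for i in range(100)]
    print(detect_phrase_boundaries(contour))  # prints [50]
```

The synthetic contour shows why the smoothing step is there. Without filtering (`width=1`), the derivative turns positive during every accent-like ripple, so many spurious boundaries are reported. The 10-frame average cancels the ripple exactly. It leaves a single rising run around the phrase reset, and the sketch reports frame 50.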

Similar resources

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of patients with PD who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Study on Detection of Prosodic Phrase Boundaries in Spontaneous Speech

Prosodic information, which can help resolve ambiguity, improve the parsing of spoken language and predict recognition errors, is becoming increasingly important in speech recognition and understanding, especially for spontaneous speech. In this paper, we investigate the detection of phrase boundaries using prosodic features in domain-specific Chinese spontaneous speech. The...

A Study of the Relationship between Acoustic Features of “bæle” and the Paralinguistic Information

Language users rely on special phonetic tools to communicate linguistic information, as well as emotional and other paralinguistic information, in daily conversation. Because they help convey semantic information to listeners, prosodic features form an essential part of linguistic behaviour, and manipulating them can potentially play an important role in transmitt...

Noise Robust Speech Recognition Using Prosodic Information

This paper proposes a noise robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information. Consequently, it also conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using the Hough transform, whi...

Publication date: 2007